class: center, middle, inverse, title-slide

# Open Science for a better World
## An introduction to the main tools
### Fabio CRUZ
### Université de Lorraine
### 2016/12/12 (updated: 2021-01-31)

---

## Music vs. Research

.pull-left[
<img src="data:image/png;base64,#images/Musica.jpeg" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#images/Paper.png" width="100%" style="display: block; margin: auto;" />
]

---

## Music vs. Research

.pull-left[
<img src="data:image/png;base64,#images/excel-chaos.jpg" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#images/Paper.png" width="100%" style="display: block; margin: auto;" />
]

---

# Main goal

- Understand the importance of the *replication principle* in research
- Create a first dynamic document using a *literate programming approach*

---

# The document pipeline

<img src="data:image/png;base64,#images/Article-pipeline-1.png" width="100%" style="display: block; margin: auto;" />

---

# The document pipeline

<img src="data:image/png;base64,#images/Article-pipeline-2.png" width="90%" style="display: block; margin: auto;" />

How can this section be described in detail for research and industry purposes?

---

# Reproducibility and Replicability

**Reproducibility:** the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original researcher (Goodman, Fanelli, and Ioannidis 2016).

- Focuses on the validity of the data analysis
- "Can we trust this analysis?"

.footnote[
Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. 2016. “What Does Research Reproducibility Mean?” Science Translational Medicine 8 (341): 341ps12–341ps12. https://doi.org/10.1126/scitranslmed.aaf5027.
]

--

**Replicability:** the act of repeating an entire study, independently of the original investigator and without the use of the original data (but generally using the same methods).

- Important for policymakers and regulatory decisions

---

## Why do we need Reproducible Research?

- Avoid misconduct such as fraudulent data and plagiarism
- Data-intensive research (e.g., big data research)
- Distributed research

<img height="270px" class="plain" src="images/Problem-1.png">
<img height="270px" class="plain" src="images/Problem-3.png">
<img height="270px" class="plain" src="images/Problem-2.png">

---
background-image: url("data:image/png;base64,#https://images-na.ssl-images-amazon.com/images/I/41KSVC8Q2JL.jpg")
background-position: 90% 50%
background-size: 30%

## Reproducibility concepts

Two key elements:

- **Literate programming for enabling reproducibility**
- Version control for enhancing transparency

*...for significantly better documentation of programs, <br>and that we can best achieve <br>this by considering programs to be works of literature.*

.footnote[
D. E. Knuth, Literate Programming, The Computer Journal, Volume 27, Issue 2, 1984, Pages 97–111, https://doi.org/10.1093/comjnl/27.2.97
]

---

## Literate programming for enabling reproducibility

*Literate programming refers to the use of a computing environment for authoring documents that contain a mix of natural (e.g., English) and computer (e.g., R) languages (Schulte et al. 2012)*

<img src="data:image/png;base64,#images/Word-excel.jpg" width="80%" style="display: block; margin: auto;" />

.footnote[
Schulte, Eric, Dan Davison, Thomas Dye, and Carsten Dominik. 2012. “A Multi-Language Computing Environment for Literate Programming and Reproducible Research.” Journal of Statistical Software 46 (1): 1–24. https://doi.org/10.18637/jss.v046.i03.
]
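
---

## Literate programming: a minimal sketch

The definition of literate programming can be illustrated with a short R sketch. It assumes a small made-up vector `scores` (hypothetical data, not taken from any study). The number in the sentence is computed by code rather than typed by hand, so rerunning the script regenerates the text.

```r
# Hypothetical data: made-up scores used only for illustration
scores <- c(12, 15, 9, 14)

# The analysis step: compute the summary statistic
mean_score <- mean(scores)

# The writing step: weave the computed value into natural-language text
report <- sprintf(
  "The mean score of %d observations is %.1f.",
  length(scores), mean_score
)

cat(report, "\n")
```

In an R Markdown document, the same idea is usually expressed as prose with embedded code, so the report updates whenever the data change.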

---

## Literate programming for enabling reproducibility

*Literate programming refers to the use of a computing environment for authoring documents that contain a mix of natural (e.g., English) and computer (e.g., R) languages (Schulte et al. 2012)*

<img src="data:image/png;base64,#images/rstudio.png" width="70%" style="display: block; margin: auto;" />

---

## What is R/RStudio?

- R is a statistical programming language
- RStudio is a convenient interface for R (an integrated development environment, IDE)

.footnote[
Schulte, Eric, Dan Davison, Thomas Dye, and Carsten Dominik. 2012. “A Multi-Language Computing Environment for Literate Programming and Reproducible Research.” Journal of Statistical Software 46 (1): 1–24. https://doi.org/10.18637/jss.v046.i03.
]

---

## R Markdown

<img src="data:image/png;base64,#images/Rmarkdown.png" width="70%" style="display: block; margin: auto;" />

- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/)

---

## GitHub

1. Open source
2. Version control

<img src="data:image/png;base64,#https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="50%" style="display: block; margin: auto;" />

---

# Summary

- Reproducible research is important as a **minimum standard**, particularly for studies that are difficult to replicate
- Infrastructure is needed for creating and distributing reproducible documents, beyond what is currently available
- There is a growing number of tools for creating reproducible documents

**Some challenges**

- It is not the solution for everyone.

---

# Main goal of the workshop

- Create a first reproducible article

<img src="data:image/png;base64,#images/Phd-comics.gif" width="100%" style="display: block; margin: auto;" />

---
class: center, middle

# Thanks!